nlp_architect.models.gnmt.utils package

Submodules

nlp_architect.models.gnmt.utils.evaluation_utils module

Utility for evaluating various tasks, e.g., translation & summarization.

nlp_architect.models.gnmt.utils.evaluation_utils.evaluate(ref_file, trans_file, metric, subword_option=None)[source]

Pick a metric and evaluate depending on the task.
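The real evaluate reads reference and translation files and supports metrics such as bleu, rouge, and accuracy. A minimal sketch of the dispatch pattern, operating on in-memory lines rather than files and implementing only an exact-match "accuracy" scorer (the other scorers are omitted here):

```python
def evaluate(ref_lines, trans_lines, metric):
    """Sketch of metric dispatch: only 'accuracy' (exact sentence match)
    is implemented; the real function also handles bleu/rouge and files."""
    if metric.lower() == "accuracy":
        matches = sum(r.strip() == t.strip()
                      for r, t in zip(ref_lines, trans_lines))
        return 100.0 * matches / max(len(ref_lines), 1)
    raise ValueError("Unknown metric %s" % metric)
```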

nlp_architect.models.gnmt.utils.iterator_utils module

For loading data into NMT models.

class nlp_architect.models.gnmt.utils.iterator_utils.BatchedInput[source]

Bases: collections.namedtuple (the class subclasses a namedtuple of the same name, which is why the documented base appears self-referential).

nlp_architect.models.gnmt.utils.iterator_utils.get_iterator(src_dataset, tgt_dataset, src_vocab_table, tgt_vocab_table, batch_size, sos, eos, random_seed, num_buckets, src_max_len=None, tgt_max_len=None, num_parallel_calls=4, output_buffer_size=None, skip_count=None, num_shards=1, shard_index=0, reshuffle_each_iteration=True, use_char_encode=False)[source]
nlp_architect.models.gnmt.utils.iterator_utils.get_infer_iterator(src_dataset, src_vocab_table, batch_size, eos, src_max_len=None, use_char_encode=False)[source]
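Internally, get_iterator groups sentence pairs into num_buckets length buckets so each batch contains similarly sized sequences. A sketch of the bucket-assignment rule (bucket_id is a hypothetical helper name; the bucket width derived from src_max_len follows the TF NMT reference implementation):

```python
def bucket_id(src_len, tgt_len, src_max_len, num_buckets):
    # Derive a bucket width so num_buckets buckets cover [0, src_max_len];
    # a pair falls into the bucket of its longer side, capped at the last bucket.
    bucket_width = (src_max_len + num_buckets - 1) // num_buckets
    bid = max(src_len // bucket_width, tgt_len // bucket_width)
    return min(bid, num_buckets - 1)
```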

nlp_architect.models.gnmt.utils.misc_utils module

Generally useful utility functions.

nlp_architect.models.gnmt.utils.misc_utils.add_summary(summary_writer, global_step, tag, value)[source]

Add a new summary to the current summary_writer. Useful to log things that are not part of the training graph, e.g., tag=BLEU.

nlp_architect.models.gnmt.utils.misc_utils.check_tensorflow_version()[source]
nlp_architect.models.gnmt.utils.misc_utils.debug_tensor(s, msg=None, summarize=10)[source]

Print the shape and value of a tensor at test time. Return a new tensor.

nlp_architect.models.gnmt.utils.misc_utils.format_bpe_text(symbols, delimiter=b'@@')[source]

Convert a sequence of BPE words into a sentence.
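A pure-Python sketch of the merge rule: subword symbols ending in the delimiter (b'@@' by default) are glued to the following symbol until a word is complete:

```python
def format_bpe_text(symbols, delimiter=b"@@"):
    """Join BPE subword symbols (bytes) back into a space-separated sentence."""
    words = []
    word = b""
    dlen = len(delimiter)
    for symbol in symbols:
        if len(symbol) >= dlen and symbol[-dlen:] == delimiter:
            word += symbol[:-dlen]        # continuation: strip marker, keep merging
        else:
            word += symbol                 # word-final symbol: flush the word
            words.append(word)
            word = b""
    return b" ".join(words)
```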

nlp_architect.models.gnmt.utils.misc_utils.format_spm_text(symbols)[source]

Decode a text in SPM (https://github.com/google/sentencepiece) format.
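SentencePiece marks word boundaries with the U+2581 character (▁) instead of a suffix delimiter. A hedged sketch of the decoding step, accepting either bytes or str symbols:

```python
def format_spm_text(symbols):
    """Join SentencePiece pieces and turn the U+2581 boundary marker into spaces."""
    text = u"".join(s.decode("utf-8") if isinstance(s, bytes) else s
                    for s in symbols)
    return text.replace(u"\u2581", u" ").strip()
```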

nlp_architect.models.gnmt.utils.misc_utils.format_text(words)[source]

Convert a sequence of words into a sentence.

nlp_architect.models.gnmt.utils.misc_utils.get_config_proto(log_device_placement=False, allow_soft_placement=True, num_intra_threads=0, num_inter_threads=0)[source]
nlp_architect.models.gnmt.utils.misc_utils.load_hparams(model_dir)[source]

Load hparams from an existing model directory.

nlp_architect.models.gnmt.utils.misc_utils.maybe_parse_standard_hparams(hparams, hparams_path)[source]

Override hparams values with existing standard hparams config.

nlp_architect.models.gnmt.utils.misc_utils.print_hparams(hparams, skip_patterns=None, header=None)[source]

Print hparams; keys matching skip_patterns can be skipped.

nlp_architect.models.gnmt.utils.misc_utils.print_out(s, f=None, new_line=True)[source]

Similar to print, but with support for flushing and for writing to a file.

nlp_architect.models.gnmt.utils.misc_utils.print_time(s, start_time)[source]

Take a start time, print elapsed duration, and return a new time.
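A minimal sketch of this pattern, following the TF NMT reference: print the message with the elapsed wall-clock time, then return a fresh timestamp so calls can be chained:

```python
import sys
import time

def print_time(s, start_time):
    """Print s with elapsed seconds since start_time; return a new timestamp."""
    print("%s, time %ds, %s." % (s, time.time() - start_time, time.ctime()))
    sys.stdout.flush()
    return time.time()
```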

nlp_architect.models.gnmt.utils.misc_utils.safe_exp(value)[source]

Exponentiation that catches overflow errors.
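A sketch of the overflow-safe behaviour: math.exp raises OverflowError for large arguments, and the function returns infinity instead of crashing (useful when turning large losses into perplexities):

```python
import math

def safe_exp(value):
    """math.exp(value), returning inf instead of raising OverflowError."""
    try:
        ans = math.exp(value)
    except OverflowError:
        ans = float("inf")
    return ans
```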

nlp_architect.models.gnmt.utils.misc_utils.save_hparams(out_dir, hparams)[source]

Save hparams.

nlp_architect.models.gnmt.utils.nmt_utils module

Utility functions specifically for NMT.

nlp_architect.models.gnmt.utils.nmt_utils.decode_and_evaluate(name, model, sess, trans_file, ref_file, metrics, subword_option, beam_width, tgt_eos, num_translations_per_input=1, decode=True, infer_mode='greedy')[source]

Decode a test set and compute a score according to the evaluation task.

nlp_architect.models.gnmt.utils.nmt_utils.get_translation(nmt_outputs, sent_id, tgt_eos, subword_option)[source]

Given batch decoding outputs, select a sentence and convert it to text.
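A simplified sketch of the selection step, using plain lists of byte tokens instead of the decoder's tensor outputs and ignoring the subword_option post-processing: pick one hypothesis from the batch and cut it at the first end-of-sentence token:

```python
def get_translation(nmt_outputs, sent_id, tgt_eos):
    """Select hypothesis sent_id and truncate at the first tgt_eos token."""
    output = list(nmt_outputs[sent_id])
    if tgt_eos in output:
        output = output[:output.index(tgt_eos)]
    return b" ".join(output)
```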

nlp_architect.models.gnmt.utils.standard_hparams_utils module

Standard hparams utils.

nlp_architect.models.gnmt.utils.standard_hparams_utils.create_standard_hparams()[source]

nlp_architect.models.gnmt.utils.vocab_utils module

Utility to handle vocabularies.

nlp_architect.models.gnmt.utils.vocab_utils.check_vocab(vocab_file, out_dir, check_special_token=True, sos=None, eos=None, unk=None)[source]

Check whether vocab_file exists; if not, create it from corpus_file.
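Besides existence, the check ensures the special tokens occupy the first (fixed) vocabulary ids. A sketch of that invariant (ensure_special_tokens is a hypothetical helper name; the default token strings follow the TF NMT reference):

```python
def ensure_special_tokens(vocab, unk="<unk>", sos="<s>", eos="</s>"):
    """If vocab does not start with [unk, sos, eos], prepend them so the
    special tokens receive the lowest ids."""
    if vocab[:3] != [unk, sos, eos]:
        vocab = [unk, sos, eos] + vocab
    return vocab
```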

nlp_architect.models.gnmt.utils.vocab_utils.create_vocab_tables(src_vocab_file, tgt_vocab_file, share_vocab)[source]

Creates vocab tables for src_vocab_file and tgt_vocab_file.

nlp_architect.models.gnmt.utils.vocab_utils.load_embed_txt(embed_file)[source]

Load embed_file into a python dictionary.

Note: the embed_file should be a GloVe/word2vec formatted txt file. Here is an example, assuming embed_size=5:

the -0.071549 0.093459 0.023738 -0.090339 0.056123
to 0.57346 0.5417 -0.23477 -0.3624 0.4037
and 0.20327 0.47348 0.050877 0.002103 0.060547

For word2vec format, the first line will be: <num_words> <emb_size>.

Parameters: embed_file – file path to the embedding file.
Returns: a dictionary that maps a word to its vector, and the size of the embedding dimension.
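A sketch of the parsing logic, taking an iterable of lines instead of a file path for brevity (the real function opens embed_file itself and also handles the word2vec header line):

```python
def load_embed_txt(lines):
    """Parse GloVe-style 'word v1 v2 ...' lines into ({word: [floats]}, dim)."""
    emb_dict = {}
    emb_size = None
    for line in lines:
        tokens = line.rstrip().split(" ")
        word, vec = tokens[0], [float(t) for t in tokens[1:]]
        emb_dict[word] = vec
        if emb_size is None:
            emb_size = len(vec)   # dimension taken from the first entry
    return emb_dict, emb_size
```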
nlp_architect.models.gnmt.utils.vocab_utils.load_vocab(vocab_file)[source]
nlp_architect.models.gnmt.utils.vocab_utils.tokens_to_bytes(tokens)[source]

Given a sequence of strings, map to sequence of bytes.

Parameters: tokens – a tf.string tensor.
Returns: a tensor of shape words.shape + [bytes_per_word] containing the byte version of each word.
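The fixed-width byte encoding can be illustrated without TensorFlow. A sketch for a single word (word_to_bytes, bytes_per_word, and the zero pad value are illustrative assumptions, not the module's API): encode to UTF-8, truncate to the fixed width, and pad the remainder:

```python
def word_to_bytes(word, bytes_per_word=16, pad=0):
    """Encode word to UTF-8 byte values, truncated/padded to bytes_per_word."""
    data = list(word.encode("utf-8"))[:bytes_per_word]
    return data + [pad] * (bytes_per_word - len(data))
```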

Module contents